Motivated by the complexity of solving convex scenario problems in one shot, in this paper we
provide a direct connection between this approach and sequential randomized methods. A rigorous
analysis of the theoretical properties of two new algorithms, for full constraint satisfaction and
partial constraint satisfaction, is provided. These algorithms extend the applicability of
scenario-based methods to real-world applications involving a large number of design variables.
Extensive numerical simulations for a non-trivial application regarding hard-disk drive servo
design demonstrate the effectiveness of the proposed solution.
I. Introduction

In recent years, research on randomized and probabilistic methods for control of uncertain
systems has successfully evolved along various directions; see e.g. [19] for an overview of
the state of the art on this topic. In particular, different approaches and techniques have been
developed and tested in several applications; see e.g. [9]. For convex control design, two main
classes of algorithms, sequential and non-sequential, have been proposed in the literature, and
their theoretical properties have been rigorously studied.

Regarding non-sequential methods, the approach that has emerged is the so-called scenario
approach, which was introduced in [6], [7]. Taking random samples of the uncertainty
$q \in Q$, the main idea of this line of research is to reformulate a semi-infinite convex
optimization problem as a sampled optimization problem subject to a finite number of random
constraints. A key problem is then to determine the sample complexity, i.e. the number of random
constraints that should be generated, so that the so-called probability of violation is smaller than
a given accuracy $\varepsilon \in (0,1)$, and this event holds with a suitably large confidence
$1-\delta \in (0,1)$. An attractive feature of the scenario approach is that the sample complexity
is determined a priori, that is, before the sampled optimization problem is solved, and it depends
only on the number of design parameters, accuracy and confidence. On the other hand, if the accuracy
$\varepsilon$ and the confidence parameter $\delta$ are very small, and the number of design
parameters is large, then the sample complexity may be huge, and the sampled convex optimization
problem cannot be easily solved in practice.

A parallel line of research, mainly focused on deriving sequential methods for feasibility, has
been developed for various specific control problems, which include linear quadratic regulators,
linear matrix inequalities and switched systems as particular cases of a general framework,
based on various update rules and probabilistic oracles, presented in [9], [19]. The main idea of
these sequential methods is to introduce the concept of validation samples. That is, at step $k$ of
the sequential algorithm, a "temporary solution" is constructed and, using a suitably generated
validation sample set, it is verified whether or not the probability of violation corresponding to
the temporary solution is smaller than a given accuracy $\varepsilon$, and this event holds with
confidence $1-\delta$. To study the properties of these algorithms, the sample complexity of the
validation set should be derived, but in this case, unlike the scenario approach, the sample
complexity is a random variable which cannot be derived a priori. Due to their sequential nature,
these algorithms might have wider applications than the scenario approach, in particular in
real-world problems where fast computations are needed because of very stringent time requirements
due to on-line implementations. However, at present, most sequential algorithms studied in the
literature are limited to probabilistic feasibility problems. One of the exceptions is the method
based on stochastic bisection proposed in [21]. A general framework for nonconvex problems is
introduced in [1], where the class of sequential probabilistic validation (SPV) algorithms is studied.

In this paper, which is an expanded version of [12], we study two new sequential algorithms for
optimization with full constraint satisfaction and partial constraint satisfaction, respectively, and
we provide a rigorous analysis of their theoretical properties regarding the probability of violation.
These algorithms fall into the class of SPV algorithms, but exploit specific convexity and finite
convergence properties of scenario methods, thus showing computational improvements upon
those presented in [1]; see Section III-A. In particular, the sample complexity of both algorithms
is derived and it enters directly into the validation step. The sample complexity increases very
mildly with probabilistic accuracy, confidence and number of design parameters, and depends on
a termination parameter which is chosen by the user. In the worst case, an optimization problem
of the same size as that of the scenario approach must be solved.

In the second part of the paper, using a non-trivial example regarding the position control of
the read/write head in a commercial hard disk drive, we provide extensive numerical simulations
which directly compare the sample complexity of the scenario approach with the number of
iterations required in the two sequential algorithms previously introduced. We remark again
that the sample complexity of the scenario approach is computed a priori, while for sequential
algorithms, the numerical results regarding the size of the validation sample set are random.
For this reason, mean values, standard deviations and other related parameters are experimentally
computed for both proposed algorithms by means of extensive Monte Carlo simulations.

II. Problem Formulation and Preliminaries

An uncertain convex problem has the form

$$
\begin{aligned}
&\underset{\theta \in \Theta}{\text{minimize}} && c^T\theta \\
&\text{subject to} && f(\theta, q) \le 0 \ \text{ for all } q \in Q,
\end{aligned}
\tag{1}
$$

where $\theta \in \Theta \subset \mathbb{R}^{n_\theta}$ is the vector of optimization variables and
$q \in Q \subset \mathbb{R}^{\ell}$ denotes the vector of uncertain parameters bounded in the set $Q$,
$f(\theta, q) : \Theta \times Q \to \mathbb{R}$ is convex in $\theta$ for any fixed value
of $q \in Q$, and $\Theta$ is a convex and closed set. We note that most uncertain convex problems can be
reformulated as (1). In particular, multiple scalar-valued constraints $f_i(\theta, q) \le 0$,
$i = 1, \dots, m$, can always be recast into the form (1) by defining
$f(\theta, q) = \max_{i=1,\dots,m} f_i(\theta, q)$.
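As a small illustration of this recasting, the following Python sketch builds $f$ as the pointwise maximum of several constraint functions (the pointwise maximum of convex functions is again convex). The two affine constraints and the helper name `combine_constraints` are hypothetical choices made only for this example; they are not taken from the paper.

```python
from typing import Callable, Sequence

Constraint = Callable[[Sequence[float], Sequence[float]], float]


def combine_constraints(constraints: Sequence[Constraint]) -> Constraint:
    """Return f(theta, q) = max_i f_i(theta, q)."""
    def f(theta, q):
        return max(fi(theta, q) for fi in constraints)
    return f


# Two illustrative constraints, both affine (hence convex) in theta.
def f1(theta, q):
    return q[0] * theta[0] + theta[1] - 1.0


def f2(theta, q):
    return -theta[0] + q[1] * theta[1] - 2.0


f = combine_constraints([f1, f2])

# f(theta, q) <= 0 holds exactly when every f_i(theta, q) <= 0.
print(f([0.0, 0.0], [1.0, 1.0]) <= 0.0)
```

With this construction a single scalar test $f(\theta, q) \le 0$ replaces the $m$ individual tests.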

In this paper, we study a probabilistic framework in which the uncertainty vector $q$ is assumed
to be a random variable and the constraint in (1) is allowed to be violated for some $q \in Q$,
provided that the rate of violation is sufficiently small. This concept is formally expressed using
the notion of "probability of violation".

Definition 1 (Probability of Violation): The probability of violation of $\theta$ for the function
$f : \Theta \times Q \to \mathbb{R}$ is defined as

$$
V(\theta) = \Pr\{q \in Q : f(\theta, q) > 0\}. \tag{2}
$$

The exact computation of $V(\theta)$ is in general very difficult, since it requires the computation of
multiple integrals associated with the probability in (2). However, this probability can be estimated
using randomization. To this end, assuming that a probability measure is given over the set $Q$, we
generate $N$ independent identically distributed (i.i.d.) samples within the set $Q$,

$$
\mathbf{q} = \{q^{(1)}, \dots, q^{(N)}\} \in Q^N,
$$

based on the given density function, where $Q^N = Q \times Q \times \cdots \times Q$ ($N$ times). Next, a Monte
Carlo approach is employed to obtain the so-called "empirical violation", which is introduced in
the following definition.

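Definition 2 (Empirical Violation): For given $\theta \in \Theta$, the empirical violation of $f(\theta, q)$ with
respect to the multisample $\mathbf{q} = \{q^{(1)}, \dots, q^{(N)}\}$ is defined as

$$
\widehat{V}(\theta, \mathbf{q}) = \frac{1}{N} \sum_{i=1}^{N} \mathbb{I}_f(\theta, q^{(i)}), \tag{3}
$$

where $\mathbb{I}_f(\theta, q^{(i)})$ is an indicator function defined as

$$
\mathbb{I}_f(\theta, q^{(i)}) =
\begin{cases}
0 & \text{if } f(\theta, q^{(i)}) \le 0, \\
1 & \text{otherwise.}
\end{cases}
$$

It is clear that, based on the definition of $\mathbb{I}_f(\theta, q^{(i)})$, the empirical violation is a random variable
bounded in the closed interval $[0, 1]$.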
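The empirical violation can be sketched in a few lines of Python. The scalar constraint $f(\theta, q) = q - \theta$ with $q$ uniform on $[0, 1]$ is a hypothetical example chosen here because its true probability of violation, $V(\theta) = 1 - \theta$ for $\theta \in [0, 1]$, is known in closed form; the function name `empirical_violation` is likewise an assumption of this sketch.

```python
import random


def empirical_violation(f, theta, samples):
    """Fraction of samples q with f(theta, q) > 0 (Definition 2)."""
    violated = sum(1 for q in samples if f(theta, q) > 0)
    return violated / len(samples)


# Hypothetical scalar example: f(theta, q) = q - theta with q uniform on [0, 1],
# so the true probability of violation is V(theta) = 1 - theta.
def f(theta, q):
    return q - theta


rng = random.Random(0)
samples = [rng.random() for _ in range(100_000)]
v_hat = empirical_violation(f, 0.9, samples)
print(v_hat)  # Monte Carlo estimate of V(0.9) = 0.1
```

Evaluating a fixed candidate on many samples only requires function evaluations, which is the property the sequential algorithms exploit in their validation step.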

A. The Scenario Approach 

In this subsection, we briefly recall the so-called scenario approach, also known as random
convex programs, which was first introduced in [6], [7]; see also [10] for additional results. In
this approach, a set of independent identically distributed random samples of cardinality $N$ is
extracted from the uncertainty set and the following random convex program is formed:

$$
\begin{aligned}
&\underset{\theta \in \Theta}{\text{minimize}} && c^T\theta \\
&\text{subject to} && f(\theta, q^{(i)}) \le 0, \quad i = 1, \dots, N.
\end{aligned}
\tag{4}
$$

The function $f(\theta, q)$ is convex for fixed $q \in Q$, and a further assumption is that problem (4)
attains a unique solution $\theta_N$. These assumptions are now formally stated.

Assumption 1 (Convexity): $\Theta \subset \mathbb{R}^{n_\theta}$ is a convex and closed set and $f(\theta, q)$ is convex in $\theta$ for
any fixed value of $q \in Q$.

Assumption 2 (Uniqueness): If the optimization problem (4) is feasible, it admits a unique
solution.

We remark that the uniqueness assumption can be relaxed in most cases by introducing a
tie-breaking rule (see Section 4.1 of [6]). The probabilistic property of the optimal solution obtained
from (4) is stated in the next lemma; see Theorem 1 in [7].

Lemma 1: Let Assumptions 1 and 2 hold and let $\delta, \varepsilon \in (0, 1)$ and $N$ satisfy the following
inequality:

$$
\sum_{i=0}^{n_\theta - 1} \binom{N}{i} \varepsilon^i (1 - \varepsilon)^{N-i} \le \delta. \tag{5}
$$

Then, either the optimization problem (4) is infeasible, which means the original problem (1)
is also infeasible, or, if feasible, with probability no smaller than $1 - \delta$, its optimal solution $\theta_N$
satisfies the inequality $V(\theta_N) \le \varepsilon$.

There are a number of results in the literature for deriving bounds on the smallest sample
complexity $N$ which satisfies (5). The least conservative one, which is proved in [3], is stated
here.

Lemma 2: Let Assumptions 1 and 2 hold. Then, for given $\varepsilon \in (0, 1)$ and $\delta \in (0, 1)$, Lemma 1
holds if

$$
N \ge \inf_{a > 1} \frac{1}{\varepsilon} \left( \frac{a}{a - 1} \right) \left( \ln \frac{1}{\delta} + (n_\theta - 1) \ln a \right). \tag{6}
$$
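To make the two sample-complexity conditions concrete, the sketch below evaluates the left-hand side of (5) in log-space and compares the smallest $N$ satisfying (5) with the explicit bound (6). It assumes the reading of (5) and (6) given above; the grid search over $a$ and the function names are choices made for this illustration only.

```python
import math


def binomial_tail(N, n_theta, eps):
    """Left-hand side of (5): sum_{i=0}^{n_theta-1} C(N,i) eps^i (1-eps)^(N-i)."""
    total = 0.0
    for i in range(min(n_theta, N + 1)):
        log_term = (math.lgamma(N + 1) - math.lgamma(i + 1) - math.lgamma(N - i + 1)
                    + i * math.log(eps) + (N - i) * math.log1p(-eps))
        total += math.exp(log_term)
    return total


def explicit_bound(n_theta, eps, delta, grid=2000):
    """Bound (6), minimised over a in (1, 21] on a coarse grid (the grid is an assumption)."""
    best = math.inf
    for j in range(1, grid + 1):
        a = 1.0 + 20.0 * j / grid
        value = (a / (a - 1.0)) * (math.log(1.0 / delta) + (n_theta - 1) * math.log(a)) / eps
        best = min(best, value)
    return math.ceil(best)


def smallest_N(n_theta, eps, delta):
    """Smallest N satisfying (5), found by bisection (the tail is decreasing in N)."""
    lo, hi = n_theta, explicit_bound(n_theta, eps, delta)
    while lo < hi:
        mid = (lo + hi) // 2
        if binomial_tail(mid, n_theta, eps) <= delta:
            hi = mid
        else:
            lo = mid + 1
    return lo


print(smallest_N(10, 0.05, 1e-6), explicit_bound(10, 0.05, 1e-6))
```

Any value of $a > 1$ yields a valid sample size in (6), so a coarse grid can only make the explicit bound slightly larger than the infimum, never invalid.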

B. Scenario with Discarded Constraints 

The idea of the scenario with discarded constraints [2], [11] is to generate $N$ i.i.d. samples and
then purposely discard $r < N - n_\theta$ of them. In other words, we solve the following optimization
problem:

$$
\begin{aligned}
&\underset{\theta \in \Theta}{\text{minimize}} && c^T\theta \\
&\text{subject to} && f(\theta, q^{(i)}) \le 0, \quad i = 1, \dots, N - r.
\end{aligned}
\tag{7}
$$

The $r$ discarded samples are chosen so that the largest improvement in the optimal objective
value is achieved. We remark that the optimal strategy to select the $r$ discarded samples is a
mixed-integer optimization problem, which may be hard to solve numerically. The following lemma
[11] states the probabilistic properties of the optimal solution obtained from (7).

Lemma 3: Let Assumptions 1 and 2 hold and let $\delta, \varepsilon \in (0, 1)$, $N$ and $r < N - n_\theta$ satisfy the
following inequality:

$$
\binom{r + n_\theta - 1}{r} \sum_{i=0}^{r + n_\theta - 1} \binom{N}{i} \varepsilon^i (1 - \varepsilon)^{N-i} \le \delta. \tag{8}
$$

Then, either the optimization problem (7) is infeasible, which means the original problem (1)
is also infeasible, or, if feasible, with probability no smaller than $1 - \delta$, its optimal solution $\theta_N$
satisfies the inequality $V(\theta_N) \le \varepsilon$.

An explicit sample bound $N$ satisfying (8) is also reported in [5].

Lemma 4: Let Assumptions 1 and 2 hold. Then, for given $\varepsilon \in (0, 1)$, $\delta \in (0, 1)$ and $r$, Lemma 3
holds if

$$
N \ge \frac{2}{\varepsilon} \ln \frac{1}{\delta} + \frac{4}{\varepsilon} (r + n_\theta). \tag{9}
$$

The sample bounds (6) and (9) can be very large even for problems with a moderate number of
decision variables. Therefore, the computational complexity of the random convex problems (4)
and (7) might be beyond the capability of the available computational tools. Motivated by this
limitation, in the next section we propose two sequential randomized algorithms.

III. The Sequential Randomized Algorithms

The main philosophy behind the proposed sequential randomized algorithms rests on the fact
that it is computationally easy to evaluate a given "candidate solution"
for a large number of random samples extracted from $Q$. On the other hand, it is clearly more
expensive to solve the optimization problems (4) or (7) when the sample bound $N$ is large. The
sequential randomized algorithms, which are presented next, mitigate the conservativeness of
the bounds (6) and (9) by generating a sequence of "design" sample sets
$\{q_d^{(1)}, \dots, q_d^{(N_k)}\}$ with increasing cardinality $N_k$, which are used in (4) and (7) for
solving the optimization problem. Parallel "validation" sample sets $\{q_v^{(1)}, \dots, q_v^{(M_k)}\}$ of
cardinality $M_k$ are also generated by both algorithms in order to check whether the given candidate
solution, obtained from solving (4) or (7), satisfies the desired violation probability.

The first algorithm is in line with those presented in [8] and [11], in the sense that it uses a
similar strategy to validate the candidate solution. However, while these algorithms have been
designed for feasibility problems, the proposed algorithms deal with optimization problems.
More generally, the two presented algorithms fall into the class of general SPV algorithms
studied in [1].

A. Full Constraint Satisfaction 

The first sequential randomized algorithm is presented in Algorithm 1 and its theoretical
properties are stated in the following theorem.

Theorem 1: Suppose that Assumptions 1 and 2 hold. If at iteration $k$ Algorithm 1 exits with
a probabilistic solution $\theta_{N_k}$, then it holds that $V(\theta_{N_k}) \le \varepsilon$ with probability no smaller than $1 - \delta$, i.e.

$$
\Pr\{V(\theta_{N_k}) \le \varepsilon\} \ge 1 - \delta.
$$

Proof: See Appendix A. ■

Remark 1 (Optimal Value of $\alpha$): The sample bound (11) is similar to the one derived in [9,
Theorem 2], originally proven in [15] and also used in [1]. However, since we are using a finite
sum$^1$, thanks to the scenario bound (6), we can use a finite hyperharmonic series (also known as
$p$-series) instead of the Riemann zeta function. The Riemann zeta function does not converge
when $\alpha$ is smaller than one, while in the presented bound (11) $\alpha$ may be smaller than one,
which improves the sample complexity, in particular for large values of $k_t$. The value
of $\alpha$ which minimizes the sample bound (11) has been computed using numerical simulations
for different values of the termination parameter $k_t$. A nearly optimal value of $\alpha$ minimizing
(11) for a wide range of $k_t$ is $\alpha = 0.1$. The bound (11) (for $\alpha = 0.1$) improves upon the bound
(17) in [9] by 5% to 15%, depending on the termination parameter $k_t$. It also improves upon
the bound in [11], which uses a finite sum but in a less effective way.
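The role of $\alpha$ and of the finite hyperharmonic sum can be explored numerically. The sketch below computes the smallest integer $M_k$ satisfying (11), assuming the denominator $\ln\frac{1}{1-\varepsilon}$ that follows from solving (20) in Appendix A for $M_k$; the function names are hypothetical.

```python
import math


def hyperharmonic(k_t, alpha):
    """Finite hyperharmonic sum S_{k_t}(alpha) = sum_{j=1}^{k_t} 1 / j**alpha."""
    return sum(1.0 / j ** alpha for j in range(1, k_t + 1))


def validation_size(k, k_t, eps, delta, alpha=0.1):
    """Smallest integer M_k satisfying (11), assuming the denominator ln(1/(1-eps))."""
    numerator = (alpha * math.log(k)
                 + math.log(hyperharmonic(k_t, alpha))
                 + math.log(1.0 / delta))
    return math.ceil(numerator / -math.log1p(-eps))


# Validation sizes over all iterations for one illustrative choice of parameters.
eps, delta, k_t = 0.05, 1e-6, 20
sizes = [validation_size(k, k_t, eps, delta) for k in range(1, k_t + 1)]
print(sizes[0], sizes[-1])
```

Because the sum is finite, any $\alpha \ge 0$ is admissible here, whereas a bound based on the Riemann zeta function would require $\alpha > 1$.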



B. Partial Constraint Satisfaction 

In the "design" and "validation" steps of Algorithm 1, all elements of the design and validation
sample sets are required to satisfy the constraint in (1). However, it is sometimes impossible
to find a solution satisfying the constraint in (1) for the entire set of uncertainty. In Algorithm 2,
we consider the scenario design with discarded constraints, where we allow a limited number of
design and validation samples to violate the constraint in (1). We now state a theorem explaining
the theoretical properties of Algorithm 2.

$^1$See in particular the summation (16) in the proof of Theorem 1.
April 9, 2013 DRAFT 



Algorithm 1 Sequential Randomized Algorithm: Full Constraint Satisfaction

1) Initialization

Set the iteration counter to zero ($k = 0$). Choose the desired probabilistic levels $\varepsilon$, $\delta$ and
the desired number of iterations $k_t > 1$.

2) Update

Set $k = k + 1$ and $N_k = \lceil k N / k_t \rceil$, where $N$ is chosen based on (6) and $\lceil x \rceil$ denotes the
smallest integer greater than or equal to $x$.

3) Design

• Draw $N_k$ i.i.d. samples $\mathbf{q}_d = \{q_d^{(1)}, \dots, q_d^{(N_k)}\}$ from the uncertainty set $Q$ based on the
underlying distribution.

• Solve the following random convex program:

$$
\begin{aligned}
\theta_{N_k} = \arg\ &\underset{\theta \in \Theta}{\text{minimize}} && c^T\theta \\
&\text{subject to} && f(\theta, q_d^{(i)}) \le 0, \quad i = 1, \dots, N_k.
\end{aligned}
\tag{10}
$$

• If the optimization problem (10) is not feasible, the original problem (1) is not feasible
as well.

• Else if the last iteration is reached ($k = k_t$), $\theta_{N_k}$ is a probabilistic solution to (1) with
confidence $\delta$ and accuracy $\varepsilon$, and Exit.

• Else, continue to the next step.

4) Validation

• Draw

$$
M_k \ge \frac{\alpha \ln k + \ln\left(S_{k_t}(\alpha)\right) + \ln \frac{1}{\delta}}{\ln \frac{1}{1 - \varepsilon}} \tag{11}
$$

i.i.d. samples $\mathbf{q}_v = \{q_v^{(1)}, \dots, q_v^{(M_k)}\}$ from the uncertainty set $Q$ based on the underlying
distribution. The parameter $S_{k_t}(\alpha)$ in (11) is a finite hyperharmonic series,
$S_{k_t}(\alpha) = \sum_{j=1}^{k_t} \frac{1}{j^{\alpha}}$.

• If $\mathbb{I}_f(\theta_{N_k}, q_v^{(i)}) = 0$ for $i = 1, \dots, M_k$, then $\theta_{N_k}$ is a probabilistic solution to (1) with
confidence $\delta$ and accuracy $\varepsilon$, and Exit.

• Else, go to step 2.
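The steps of Algorithm 1 can be sketched on a toy problem. The example below is a minimal illustration under stated assumptions, not the authors' implementation: it uses the hypothetical scalar problem of minimizing $\theta$ subject to $q - \theta \le 0$ with $q$ uniform on $[0, 1]$, for which the scenario solution is simply the largest design sample, so no convex solver is needed and the design step is always feasible. The scenario bound $N$ is supplied by the caller, and the denominator of (11) is taken as $\ln\frac{1}{1-\varepsilon}$.

```python
import math
import random


def sequential_full(N, k_t, eps, delta, alpha=0.1, seed=0):
    """Sketch of Algorithm 1 for: minimize theta s.t. q - theta <= 0, q ~ U[0, 1]."""
    rng = random.Random(seed)
    S = sum(1.0 / j ** alpha for j in range(1, k_t + 1))     # finite hyperharmonic sum
    for k in range(1, k_t + 1):
        N_k = math.ceil(k * N / k_t)                         # update step
        design = [rng.random() for _ in range(N_k)]          # design samples
        theta = max(design)                                  # closed-form scenario solution
        if k == k_t:
            return theta, k, N_k                             # last iteration: scenario guarantee
        M_k = math.ceil((alpha * math.log(k) + math.log(S) + math.log(1.0 / delta))
                        / -math.log1p(-eps))                 # validation size from (11)
        if all(rng.random() - theta <= 0.0 for _ in range(M_k)):
            return theta, k, N_k                             # every validation sample satisfied
    raise AssertionError("unreachable: the loop always returns at k == k_t")


eps, delta, k_t = 0.05, 1e-3, 10
N = math.ceil(math.log(1.0 / delta) / eps)   # a valid N for (5) when n_theta = 1
theta, k, N_k = sequential_full(N, k_t, eps, delta)
print(theta, k, N_k)
```

For this toy problem the true probability of violation of the returned candidate is $1 - \theta$, which makes it easy to inspect how early exits trade design samples against validation samples.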
Theorem 2: Suppose that Assumptions 1 and 2 hold. If at iteration $k$ Algorithm 2 exits with
a probabilistic solution $\theta_{N_k}$, then it holds that $V(\theta_{N_k}) \le \varepsilon$ with probability no smaller than $1 - \delta$, i.e.

$$
\Pr\{V(\theta_{N_k}) \le \varepsilon\} \ge 1 - \delta.
$$

Proof: See Appendix B. ■

Remark 2 (Choice of $N_{k_t}$): The cardinality of the design sample set at the last iteration in
Algorithms 1 and 2, $N_{k_t}$, is chosen to be exactly equal to the bounds (6) and (9), respectively.
Therefore, the complexity of the last iteration, if it is reached, is exactly equal to that of the
scenario approach and the scenario with discarded constraints.

Remark 3 (Value of the Objective): If Algorithms 1 and 2 exit successfully at iteration
$k < k_t$, the number of samples used for design is smaller than the number used in
the scenario approach or in the scenario with discarded constraints. Note that the consequent reduction
in the number of design samples may improve the objective value $c^T\theta_{N_k}$ with respect
to the one obtained by the scenario approach.

Algorithm 2 differs from the algorithm presented in [1], which was derived for non-convex
problems, in a number of aspects. In particular, the cardinalities of the sample sets used for
design and validation increase linearly with the iteration counter $k$, while they increase exponentially
in that algorithm. Furthermore, there the cardinality of the validation sample set at the last iteration, $M_{k_t}$, is
chosen to be equal to the cardinality of the sample set used for design at the last iteration, $N_{k_t}$,
while in the presented algorithm $M_{k_t}$, and hence $\beta_w$, are chosen based on the additive Chernoff
bound, which is less conservative.

We also note that both Algorithms 1 and 2 fall within the class of SPV algorithms, in which
the "design" and "validation" steps are independent. As a result, in principle we could use the
same strategy as Algorithm 1 to tackle discarded-constraints problems. Nevertheless, Algorithm 2
appears to be more suitable for such problems, since (13) forces the solution to
violate some constraints.

C. Termination Parameter $k_t$

The termination parameter $k_t$, which can be chosen by the user, defines the maximum number of
iterations of the algorithm. We note that the choice of $k_t$ directly affects the cardinality of
the sample sets used for design, $N_k$, and validation, $M_k$, at each iteration, although they converge
Algorithm 2 Sequential Randomized Algorithm: Partial Constraint Satisfaction

1) Initialization

Set the iteration counter to zero ($k = 0$). Choose the desired probabilistic levels $\varepsilon$, $\delta$, the
desired number of iterations $k_t > 1$, the desired number of discarded constraints $r$ and
define the following parameters:

$$
\beta_v = \max\left\{1, \ \beta_w \left(k_t \ln \frac{k_t}{\delta}\right)^{-1}\right\}, \qquad \beta_w = \frac{1}{4\varepsilon} \ln \frac{1}{\delta}. \tag{12}
$$

2) Update

Set $k = k + 1$, $N_k = \lceil k N / k_t \rceil$ and $N_{k,r} = \lceil (N - r) k / k_t \rceil$, where $N$ is chosen based on (9).

3) Design

• Draw $N_k$ i.i.d. samples $\mathbf{q}_d = \{q_d^{(1)}, \dots, q_d^{(N_k)}\}$ from the uncertainty set $Q$ based on the
underlying distribution.

• Solve the following random convex program:

$$
\begin{aligned}
\theta_{N_k} = \arg\ &\underset{\theta \in \Theta}{\text{minimize}} && c^T\theta \\
&\text{subject to} && f(\theta, q_d^{(i)}) \le 0, \quad i = 1, \dots, N_{k,r}.
\end{aligned}
\tag{13}
$$

• If the optimization problem (13) is not feasible, the original problem (1) is not feasible
as well.

• Else if the last iteration is reached ($k = k_t$), $\theta_{N_k}$ is a probabilistic solution to (1) with
confidence $\delta$ and accuracy $\varepsilon$, and Exit.

• Else, continue to the next step.

4) Validation

• Draw $M_k = \left\lceil 2 k \beta_v \frac{1}{\varepsilon} \ln \frac{k_t}{\delta} \right\rceil$ i.i.d. samples $\mathbf{q}_v = \{q_v^{(1)}, \dots, q_v^{(M_k)}\}$ from the uncertainty set
$Q$ based on the underlying distribution.

• If

$$
\frac{1}{M_k} \sum_{i=1}^{M_k} \mathbb{I}_f(\theta_{N_k}, q_v^{(i)}) \le \left(1 - (k \beta_v)^{-1/2}\right) \varepsilon \tag{14}
$$

then $\theta_{N_k}$ is a probabilistic solution to (1) with confidence $\delta$ and accuracy $\varepsilon$, and Exit.

• Else, go to step 2.
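The validation step of Algorithm 2 depends only on a few scalar quantities, which the following sketch computes. It rests on an assumed reading of the partly illegible definition (12), namely $\beta_w = \frac{1}{4\varepsilon}\ln\frac{1}{\delta}$ and $\beta_v = \max\{1, \beta_w (k_t \ln(k_t/\delta))^{-1}\}$, which is consistent with the derivation in Appendix B and with the bound on $k_t$ in Section III-C; the function names are hypothetical.

```python
import math


def partial_validation_parameters(k, k_t, eps, delta):
    """Validation size M_k and acceptance threshold of (14) at iteration k."""
    beta_w = math.log(1.0 / delta) / (4.0 * eps)                    # assumed form of beta_w
    beta_v = max(1.0, beta_w / (k_t * math.log(k_t / delta)))      # assumed form of beta_v
    M_k = math.ceil(2.0 * k * beta_v * math.log(k_t / delta) / eps)
    threshold = (1.0 - (k * beta_v) ** -0.5) * eps
    return M_k, threshold


def accept(indicators, threshold):
    """Check (14): the empirical violation of the candidate is at most the threshold."""
    return sum(indicators) / len(indicators) <= threshold


M1, t1 = partial_validation_parameters(1, 5, 0.1, 0.01)
M4, t4 = partial_validation_parameters(4, 5, 0.1, 0.01)
print(M1, t1, M4, t4)
```

The threshold grows with $k$, so later iterations tolerate a larger fraction of violated validation samples, at the price of a larger validation set.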
to fixed values (independent of $k_t$) at the last iteration. In problems for which the bounds (6)
and (9) are large, we suggest using a larger $k_t$. Then, the sequence of sample bounds $N_k$
starts from a smaller number and does not increase significantly with the iteration counter $k$.
We also remark that the right-hand side of the inequality (14) in Algorithm 2 cannot be negative,
which in turn requires $\beta_v$ to be no smaller than one. This condition is taken into account in defining
$\beta_v$ in (12). However, we can avoid generating $\beta_v < 1$ by an appropriate choice of $k_t$. To this
end, we solve the inequality $\beta_v > 1$ for $k_t$ as follows:

$$
\frac{\beta_w}{k_t \ln \frac{k_t}{\delta}} > 1.
$$

For implementation purposes, it is useful to use the function "LambertW", also known as the "Omega
function" or "product logarithm"$^2$. Then, we solve the previous inequality for $k_t$:

$$
k_t < \frac{\beta_w}{\mathrm{LambertW}\left(\frac{\beta_w}{\delta}\right)}.
$$
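The bound on $k_t$ can be evaluated without special-function libraries. The sketch below implements the principal branch of the Lambert W function by Newton's method and uses it in the bound above; it assumes $\beta_w = \frac{1}{4\varepsilon}\ln\frac{1}{\delta}$, and the helper names `lambert_w` and `max_iterations` are hypothetical.

```python
import math


def lambert_w(x, tol=1e-12):
    """Principal branch of the Lambert W function for x >= 0, via Newton's method."""
    w = math.log1p(x)                      # starting point to the right of the root
    for _ in range(100):
        ew = math.exp(w)
        step = (w * ew - x) / (ew * (w + 1.0))
        w -= step
        if abs(step) < tol:
            break
    return w


def max_iterations(eps, delta):
    """Upper limit on k_t from k_t < beta_w / W(beta_w / delta)."""
    beta_w = math.log(1.0 / delta) / (4.0 * eps)     # assumed form of beta_w
    return beta_w / lambert_w(beta_w / delta)


print(max_iterations(0.01, 1e-6))
```

The identity $\ln(x / W(x)) = W(x)$ is what turns the inequality $k_t \ln(k_t/\delta) < \beta_w$ into the closed-form limit on $k_t$.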



D. Complexity Analysis 

The sample complexity (and computational complexity) of Algorithms 1 and 2 is a random
variable because the number of iterations is random. The sample complexity at which the
algorithm terminates ($N_k$ and $M_k$) is only known a posteriori, while in the scenario approach
we can establish a priori sample bounds. Taking into account that the asymptotic computational
complexity of most SDP solvers grows rapidly with the size of the problem under
consideration [16], we conclude that if Algorithms 1 and 2 exit with a smaller number of design
samples than the bounds (6) and (9), which is the case most of the time, the reduction in the
number of design samples can significantly reduce the computational complexity. For instance,
the asymptotic computational complexity of solving an $n \times n$ linear matrix inequality (LMI) with
$m$ decision variables using the SDP solver SEDUMI is of order $O(m^2 n^{2.5} + n^{3.5})$ [16]. Hence, a
decrease in $n$, which represents the size of the LMI, can significantly reduce the computational
complexity. We also note that the computational complexity of the validation steps in both presented
algorithms is not significant, since they just require analysis of a candidate solution for a number
of i.i.d. samples extracted from the uncertainty set.
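A rough feel for this saving can be obtained from the quoted order of complexity. The sketch below assumes, for illustration only, that each design sample contributes one LMI block of fixed size and that the blocks are stacked diagonally, so that the LMI size $n$ grows linearly with the number of design samples; the function names and the numbers are hypothetical.

```python
def sdp_cost(m, n):
    """Order-of-magnitude cost m^2 n^2.5 + n^3.5 for an n x n LMI with m variables."""
    return m ** 2 * n ** 2.5 + n ** 3.5


def speedup(m, block_size, n_full, n_reduced):
    """Cost ratio between using n_full and n_reduced design samples.

    Assumes one LMI block of size block_size per design sample (an assumption
    of this sketch, not a statement from the paper).
    """
    return sdp_cost(m, block_size * n_full) / sdp_cost(m, block_size * n_reduced)


# Exiting with a quarter of the scenario samples in this hypothetical setting.
print(speedup(m=20, block_size=5, n_full=1000, n_reduced=250))
```

Since the cost is a sum of powers of $n$ with exponents 2.5 and 3.5, reducing the number of design samples by a factor of four reduces this estimate by a factor between $4^{2.5} = 32$ and $4^{3.5} = 128$.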

$^2$This function is the inverse function of $f(W) = W e^{W}$. In other words, $W = \mathrm{LambertW}(f(W))$; see e.g. [14] for more
details. The Matlab command is W = lambertw(f(W)).
TABLE I
Simulation Results Obtained Using Algorithm 1

TABLE II
Simulation Results Obtained Using Algorithm 2



IV. Application to Hard Disk Drive Servo Design 

In this section, we employ the developed algorithms to solve a non-trivial industrial example.
The problem under consideration is the design of a robust track-following controller for a
hard disk drive (HDD) servo system affected by parametric uncertainty. The servo system in an HDD
plays a crucial role in increasing the storage capacity by providing a more accurate positioning
algorithm. The goal in this application is to achieve a storage density of 10 terabits per square
inch (10 Tb/in$^2$). This requires the variance of the deviation of the read/write head from the center
of a data track to be less than 1.16 nanometer. Such a high performance has to be achieved
in a robust manner, that is, for all drives produced in a mass production line. On the other
hand, some imperfections in the production line, such as manufacturing tolerances and slightly
different materials or environmental conditions, lead to slightly different dynamics over a batch
of products.

A voice coil motor (VCM) actuator in a disk drive system can be modeled in the form

$$
P_{VCM} = \sum_{i=1}^{3} \frac{A_i}{s^2 + 2 \zeta_i \omega_i s + \omega_i^2}, \tag{15}
$$

where $\zeta_i$, $\omega_i$ and $A_i$ are the damping ratio, natural frequency and modal constant for each resonance
mode; see [18] for their nominal values. We assume each natural frequency, damping ratio and
modal constant to vary by 5%, 5% and 10% from their nominal values, respectively. Hence, there
are nine uncertain parameters in the plant. Since the problem is of regulation type, the sensitivity
transfer function is of vital importance. In order to shape the sensitivity transfer function, we
augment the open-loop plant with the necessary weighting functions. The objective is to design a
full-order dynamic output feedback controller which minimizes the worst-case $\mathcal{H}_\infty$ norm of the
transfer function from disturbance to output. Neglecting the uncertain parameters, this problem
can be easily reformulated in terms of linear matrix inequalities [18]. The uncertain parameters enter
into the plant description in a non-affine fashion; therefore, classical robust techniques are unable
to solve the problem without introducing conservatism.

The sequential algorithms of Section III are implemented in Matlab using the
Randomized Algorithm Control Toolbox (RACT) [20]. In the simulations, we assumed the probability
density function of all uncertain parameters to be uniform. The uniform distribution is
chosen due to its worst-case nature [4]. The number of discarded constraints $r$ in Algorithm 2 is
chosen to be zero. The resulting optimization problem is solved for different values of $\varepsilon$, $\delta$ and
$k_t$. Furthermore, we run the simulation 100 times for each pair. The mean, standard deviation
and worst-case values of the number of design samples, validation samples, objective value and
the iteration number at which the algorithm exits are tabulated in Table I and Table II. The
scenario bounds are also shown in the same tables for easy comparison. It is observed that,
using the proposed algorithms, we can achieve the same probabilistic levels with a much smaller
number of design samples.
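The uncertainty model used in the simulations can be sketched as follows. The nominal values below are hypothetical placeholders (the actual ones are given in the cited reference and are not reproduced here), the three resonance modes are inferred from the nine uncertain parameters, and "vary by 5%, 5% and 10%" is read as a symmetric relative interval around each nominal value; all function names are assumptions of this sketch.

```python
import random

# Hypothetical nominal values (A_i, zeta_i, omega_i in rad/s) for three resonance modes.
NOMINAL = [(1.0e7, 0.5, 2.0e2), (-1.0e7, 0.02, 2.5e4), (5.0e6, 0.03, 4.0e4)]
SPREAD = (0.10, 0.05, 0.05)   # relative variation of A_i, zeta_i, omega_i


def sample_modes(rng):
    """Draw one uniform sample of the nine uncertain parameters."""
    return [tuple(v * (1.0 + rng.uniform(-s, s)) for v, s in zip(mode, SPREAD))
            for mode in NOMINAL]


def vcm_response(modes, omega):
    """Frequency response of the modal model (15) at s = j*omega."""
    s = 1j * omega
    return sum(A / (s * s + 2.0 * zeta * w * s + w * w) for A, zeta, w in modes)


rng = random.Random(0)
modes = sample_modes(rng)
print(abs(vcm_response(modes, 1.0e3)))
```

Each call to `sample_modes` plays the role of one sample $q^{(i)}$, from which a design constraint or a validation test would be formed.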

V. Conclusions 

We proposed two new sequential methods for solving uncertain convex optimization problems in a
computationally efficient way. The main philosophy behind the proposed sequential randomized
algorithms stems from the consideration that it is easy, from a computational viewpoint,
to validate a given "candidate solution" for a large number of random samples. The algorithms
have been tested on a numerical example, and extensive numerical simulations show how the
total computational effort is "diluted" by applying the proposed sequential methodology.
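Appendix A
Proof of Theorem 1

Following the same reasoning as in [11], we introduce the following events:

$$
\begin{aligned}
\mathrm{Iter}_k &= \{\text{the } k\text{th outer iteration is reached}\}, \\
\mathrm{Feas}_k &= \{\theta_{N_k} \text{ is declared as feasible in the "validation" step}\}, \\
\mathrm{Bad}_k &= \{V(\theta_{N_k}) > \varepsilon\}, \\
\mathrm{ExitBad}_k &= \{\text{Algorithm 1 exits at iteration } k \cap \mathrm{Bad}_k\}, \\
\mathrm{ExitBad} &= \{\text{Algorithm 1 exits at some unspecified iteration } k \cap \mathrm{Bad}_k\}.
\end{aligned}
$$

The goal is to bound the probability of the event "ExitBad". Since $\mathrm{ExitBad}_i \cap \mathrm{ExitBad}_j = \emptyset$
for $i \ne j$, the probability of the event "ExitBad" can be reformulated in terms of the events
"ExitBad$_k$" as

$$
\begin{aligned}
\Pr\{\mathrm{ExitBad}\} &= \Pr\{\mathrm{ExitBad}_1 \cup \mathrm{ExitBad}_2 \cup \cdots \cup \mathrm{ExitBad}_{k_t}\} \\
&= \Pr\{\mathrm{ExitBad}_1\} + \Pr\{\mathrm{ExitBad}_2\} + \cdots + \Pr\{\mathrm{ExitBad}_{k_t}\}.
\end{aligned}
\tag{16}
$$

From the definition of the event "ExitBad$_k$", and considering that to exit at iteration
$k$ Algorithm 1 needs to reach the $k$th iteration and declare $\theta_{N_k}$ feasible, we arrive at

$$
\begin{aligned}
\Pr\{\mathrm{ExitBad}_k\} &= \Pr\{\mathrm{Feas}_k \cap \mathrm{Bad}_k \cap \mathrm{Iter}_k\} = \Pr\{\mathrm{Feas}_k \cap \mathrm{Bad}_k \mid \mathrm{Iter}_k\} \Pr\{\mathrm{Iter}_k\} \\
&\le \Pr\{\mathrm{Feas}_k \cap \mathrm{Bad}_k \mid \mathrm{Iter}_k\} = \Pr\{\mathrm{Feas}_k \mid \mathrm{Bad}_k \cap \mathrm{Iter}_k\} \Pr\{\mathrm{Bad}_k \mid \mathrm{Iter}_k\} \\
&\le \Pr\{\mathrm{Feas}_k \mid \mathrm{Bad}_k \cap \mathrm{Iter}_k\}.
\end{aligned}
\tag{17}
$$

Using the result of Theorem 1 in [8], we can bound the right-hand side of (17):

$$
\Pr\{\mathrm{Feas}_k \mid \mathrm{Bad}_k \cap \mathrm{Iter}_k\} < (1 - \varepsilon)^{M_k}. \tag{18}
$$

Combining (16) and (18) results in

$$
\Pr\{\mathrm{ExitBad}\} < (1 - \varepsilon)^{M_1} + (1 - \varepsilon)^{M_2} + \cdots + (1 - \varepsilon)^{M_{k_t}} = \sum_{k=1}^{k_t} (1 - \varepsilon)^{M_k}. \tag{19}
$$

The summation in (19) can be made arbitrarily small by an appropriate choice of $M_k$. By choosing

$$
(1 - \varepsilon)^{M_k} = \frac{1}{k^{\alpha} S_{k_t}(\alpha)} \delta, \tag{20}
$$

where $\delta \in (0, 1)$ is a (small) desired probability level, we have

$$
\Pr\{\mathrm{ExitBad}\} < \sum_{k=1}^{k_t} \frac{\delta}{k^{\alpha} S_{k_t}(\alpha)} = \frac{\delta}{S_{k_t}(\alpha)} \sum_{k=1}^{k_t} \frac{1}{k^{\alpha}} = \delta.
$$

Therefore, the appropriate choice of $M_k$ which guarantees $\Pr\{\mathrm{ExitBad}\} < \delta$ can be computed
by solving (20) for $M_k$, which results in the bound (11).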

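Appendix B
Proof of Theorem 2

Given $N$, $\varepsilon \in (0, 1)$, $\rho \in [0, 1)$ and $f : \Theta \times Q \to \mathbb{R}$, the probability of one-sided constrained
failure, denoted by $p_f(N, \varepsilon, \rho)$, is defined as

$$
p_f(N, \varepsilon, \rho) = \Pr\left\{\mathbf{q} \in Q^N : \text{there exists } \theta \in \Theta \text{ such that } \widehat{V}(\theta, \mathbf{q}) \le \rho \text{ and } V(\theta) - \widehat{V}(\theta, \mathbf{q}) > \varepsilon\right\}.
$$

Denote by $\delta_k$ the probability of misclassification at iteration $k$. Therefore,

$$
\delta_k \le \Pr\left\{\mathbf{q}_v \in Q^{M_k} : \widehat{V}(\theta_{N_k}, \mathbf{q}_v) \le \left(1 - (k \beta_v)^{-1/2}\right) \varepsilon \text{ and } V(\theta_{N_k}) > \varepsilon\right\}.
$$

By defining $\rho_k = \left(1 - (k \beta_v)^{-1/2}\right) \varepsilon$ and $\varepsilon_k = (k \beta_v)^{-1/2} \varepsilon$, the probability of misclassification
can be expressed in terms of the probability of one-sided constrained failure:

$$
\delta_k \le \Pr\left\{\mathbf{q}_v \in Q^{M_k} : \widehat{V}(\theta_{N_k}, \mathbf{q}_v) \le \rho_k \text{ and } V(\theta_{N_k}) - \widehat{V}(\theta_{N_k}, \mathbf{q}_v) > \varepsilon_k\right\}.
$$

Using Theorem 1 in [2], it follows that

$$
\delta_k \le \Pr\left\{\mathbf{q} \in Q^{M_k} : \frac{V(\theta_{N_k}) - \widehat{V}(\theta_{N_k}, \mathbf{q})}{\sqrt{V(\theta_{N_k})}} > \frac{\varepsilon_k}{\sqrt{\varepsilon_k + \rho_k}}\right\}. \tag{21}
$$

For fixed $\theta$, $\varepsilon$ and any $f : \Theta \times Q \to \mathbb{R}$, the one-sided multiplicative Chernoff inequality is defined
as

$$
\Pr\left\{V(\theta) - \widehat{V}(\theta, \mathbf{q}) \ge \varepsilon V(\theta)\right\} \le e^{-\frac{V(\theta) N \varepsilon^2}{2}}. \tag{22}
$$

Letting $\varepsilon$ be $\frac{\varepsilon_k}{\sqrt{V(\theta_{N_k})} \sqrt{\varepsilon_k + \rho_k}}$, combining inequalities (21) and (22), and taking the natural logarithm of
both sides, we obtain

$$
\ln \delta_k \le -\frac{\varepsilon_k^2 M_k}{2 (\varepsilon_k + \rho_k)} = -\frac{\varepsilon M_k}{2 k \beta_v} \le -\ln \frac{k_t}{\delta} = \ln \frac{\delta}{k_t}.
$$

Therefore, taking into account that the probability of misclassification of the algorithm is the
sum of the probabilities of misclassification at each iteration ($\delta_k$), we arrive at

$$
\sum_{k=1}^{k_t} \delta_k \le \sum_{k=1}^{k_t} \frac{\delta}{k_t} = \delta,
$$

which proves the statement.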